feat(observability): add runner VM hostmetrics Grafana dashboard #187
Draft
Adds a read-only Grafana dashboard (editable: false) for runner VM host-level metrics, to be served via cos-configuration-k8s using the grafana-dashboard relation, which provisions it as an immutable filesystem dashboard in Grafana.

The dashboard covers:
- CPU utilisation by state and load averages
- Memory usage by state
- Disk I/O throughput and operations
- Filesystem usage % by mount point
- Network traffic, errors and drops

Template variables:
- github_job_id: filter by GitHub Actions workflow run job ID
- instance: filter by runner hostname

Metric names follow the Prometheus naming convention used by the OpenTelemetry hostmetrics receiver (e.g. system_cpu_time_seconds_total). The github_job_id label is expected to be set as a resource attribute by the otelcol pipeline that collects metrics from the runner VMs.

Related: ISD-5152
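To show how a hostmetrics name such as `system.cpu.time` ends up as the `system_cpu_time_seconds_total` series that the dashboard queries, here is a simplified sketch of the OTel to Prometheus name translation. It is an approximation under stated assumptions (only the `s` and `By` units are mapped, and the function name is hypothetical), not the collector's actual implementation.

```python
import re

# Assumed subset of OTel unit -> Prometheus suffix mappings; the real
# translator handles many more units.
UNIT_SUFFIXES = {
    "s": "seconds",
    "By": "bytes",
}


def otel_to_prometheus_name(name: str, unit: str, monotonic_sum: bool) -> str:
    """Approximate the OTel -> Prometheus metric name translation.

    Characters that are invalid in Prometheus names (such as ".") become "_",
    a mapped unit is appended as a suffix, units not in UNIT_SUFFIXES
    (including annotations such as "{operations}") add no suffix, and
    monotonic sums (counters) get a trailing "_total".
    """
    prom = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not prom.endswith("_" + suffix):
        prom += "_" + suffix
    if monotonic_sum and not prom.endswith("_total"):
        prom += "_total"
    return prom


if __name__ == "__main__":
    print(otel_to_prometheus_name("system.cpu.time", "s", True))  # system_cpu_time_seconds_total
    print(otel_to_prometheus_name("system.memory.usage", "By", False))  # system_memory_usage_bytes
    print(otel_to_prometheus_name("system.disk.operations", "{operations}", True))  # system_disk_operations_total
```

If a panel shows no data, comparing the expected translated name against what Prometheus actually stores is a quick first check.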
Rename grafana_dashboards/ to runner_grafana_dashboards/ to make the purpose explicit at the repo root level (runner VM host metrics, not charm workload metrics).

Update README with:
- Repository layout overview
- Observability section explaining the cos-configuration-k8s delivery mechanism and the immutability guarantee
- Table of conventions for where dashboards live and what grafana_dashboards_path value to use in Terraform
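Because the immutability guarantee and the `grafana_dashboards_path` convention both depend on what sits in `runner_grafana_dashboards/`, a small check can keep the directory honest. The sketch below is hypothetical (no such CI job is part of this PR); it assumes every `*.json` file there should set `"editable": false` and declare a datasource in `__inputs`, as the new dashboard does.

```python
import json
import sys
from pathlib import Path

# Directory name from this PR; the checks are an assumed convention.
DASHBOARD_DIR = Path("runner_grafana_dashboards")


def check_dashboard(path: Path) -> list[str]:
    """Return a list of problems found in one dashboard JSON file."""
    try:
        dashboard = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]
    if not isinstance(dashboard, dict):
        return [f"{path}: top-level JSON value is not an object"]
    problems = []
    # Grafana's dashboard JSON model uses a boolean "editable" field.
    if dashboard.get("editable") is not False:
        problems.append(f'{path}: expected "editable": false')
    # Exported dashboards list their required datasources in "__inputs".
    inputs = dashboard.get("__inputs", [])
    if not any(isinstance(i, dict) and i.get("type") == "datasource" for i in inputs):
        problems.append(f"{path}: no datasource declared in __inputs")
    return problems


def main(directory: Path = DASHBOARD_DIR) -> int:
    """Check every dashboard in the directory; return a process exit code."""
    problems = []
    for path in sorted(directory.glob("*.json")):
        problems.extend(check_dashboard(path))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```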
Replace github_job_id with github_job and instance with github_runner to match the actual attribute labels set by the pre-job OTel config (see canonical/github-runner-operator#781). Add github_repository and github_workflow template variables so the dashboard can be filtered the same way as the existing PS6 hostmetrics dashboard.
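One way the four template variables could be wired so that each narrows the options of the next is with chained `label_values()` queries. The sketch below generates such a Grafana `templating` block; the cascading order, the anchor metric and the `${DS_PROMETHEUS}` datasource reference are assumptions for illustration, not necessarily what the committed JSON contains.

```python
import json

# Label names come from this PR; the order and anchor metric are assumptions.
VARIABLES = ["github_repository", "github_workflow", "github_job", "github_runner"]
ANCHOR_METRIC = "system_cpu_time_seconds_total"


def build_templating(variables=VARIABLES, metric=ANCHOR_METRIC):
    """Build a Grafana `templating` block with chained label_values() queries."""
    entries = []
    for i, name in enumerate(variables):
        # Each variable is filtered by every variable defined before it.
        selector = ",".join(f'{prev}=~"${prev}"' for prev in variables[:i])
        series = f"{metric}{{{selector}}}" if selector else metric
        entries.append({
            "name": name,
            "label": name,
            "type": "query",
            "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
            "query": f"label_values({series}, {name})",
            "refresh": 2,  # re-query options when the time range changes
            "includeAll": True,
            "multi": True,
        })
    return {"list": entries}


if __name__ == "__main__":
    print(json.dumps(build_templating(), indent=2))
```

Chaining keeps the `github_runner` dropdown short: once a repository and workflow are picked, only runners that reported metrics for them are offered.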
Summary
Adds a read-only Grafana dashboard for runner VM host-level metrics, served via `cos-configuration-k8s` using the `grafana-dashboard` relation. Provisioned dashboards are immutable in Grafana (filesystem-based), so they cannot be edited regardless of user role.

Changes
- `runner_grafana_dashboards/runner_vm_hostmetrics.json`: new dashboard covering CPU, memory, disk I/O, filesystem, network traffic and load averages
  - Template variables: `github_job` (filter by workflow job) and `github_runner` (filter by runner hostname), plus `github_repository` and `github_workflow`
  - `editable: false` and an `__inputs` datasource declaration
- `README.md`: documents the repo layout and the observability dashboard delivery mechanism

Notes

- The `github_job` label (originally `github_job_id`) is expected to be set as a resource attribute by the otelcol pipeline; the label names were aligned with the pre-job OTel config in canonical/github-runner-operator#781
- The `cos-configuration-k8s` deployment lives in platform-engineering-deployments, branch `feat/runner-hostmetrics-cos-configuration`

Closes / relates to: ISD-5152
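As an illustration of what the CPU panel computes, the hypothetical helper below builds a PromQL expression for the fraction of CPU time spent in each state, per runner, filtered by the template variables. It assumes the CPU state attribute is exported as a `state` label (as in the hostmetrics receiver's classic naming); the committed dashboard may phrase its expressions differently.

```python
# Hypothetical helper; label names from this PR, `state` label assumed.
FILTER_LABELS = ["github_repository", "github_workflow", "github_job", "github_runner"]


def selector(labels):
    """Regex matchers bound to Grafana template variables of the same name."""
    return ",".join(f'{label}=~"${label}"' for label in labels)


def cpu_utilisation_by_state(metric="system_cpu_time_seconds_total",
                             labels=FILTER_LABELS, group="github_runner"):
    """Fraction of CPU time per state, per runner, over Grafana's rate window."""
    rate = f"rate({metric}{{{selector(labels)}}}[$__rate_interval])"
    # Per-state rate divided by the all-state total for the same runner;
    # ignoring(state) + group_left matches many state series to one total.
    return (
        f"sum by ({group}, state) ({rate})\n"
        f"  / ignoring(state) group_left\n"
        f"sum by ({group}) ({rate})"
    )


if __name__ == "__main__":
    print(cpu_utilisation_by_state())
```

Grouping by `github_runner` on both sides keeps each runner's breakdown separate when several runners are selected at once.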